
Molecular & Cellular Proteomics

Elsevier BV

Preprints posted in the last 7 days, ranked by how well they match Molecular & Cellular Proteomics's content profile, based on 158 papers previously published here. The average preprint has a 0.10% match score for this journal, so anything above that is already an above-average fit.

1
Distinct Metabolic Signatures Distinguish Lung, Colorectal and Ovarian Cancer

Tsiara, I.; Vouzaxaki, E.; Ekström, J.; Rameika, N.; Yang, F.; Jain, A.; Iglesias Alonso, A.; Sjöblom, T.; Globisch, D.

2026-04-13 oncology 10.64898/2026.04.08.26350309 medRxiv
Top 0.8%
2.2%

Cancer-related casualties are the most common cause of death worldwide. The discovery of biomarkers is of utmost importance for diagnosis and disease monitoring. Herein, we performed a comprehensive metabolomics biomarker discovery effort in plasma from 615 lung, ovarian and colorectal cancer patients at diagnosis and 95 non-cancerous control subjects. This pan-cancer investigation identified specific panels of metabolites in the entire sample cohort with high discriminating power, demonstrated by combined ROC AUC values of up to 0.95. The identified metabolites are mainly associated with lipid and amino acid metabolism as well as xenobiotic transformation. These metabolite panels of high predictive power provide new metabolic insights into these cancers and demonstrate the potential of metabolomics for improved diagnosis and monitoring of disease progression.

2
A Tale of Two Countries: Comparison of Rectal Cancer Characteristics Between Pakistani Americans and Native Pakistanis

Sherwani, M.; Azhar, M. K.; Khan, S.; Ali, D.; Husain, S.; Khan, A.

2026-04-11 surgery 10.64898/2026.04.07.26350364 medRxiv
Top 1%
1.6%

Introduction: Comparison of rectal cancer characteristics in Pakistani Americans and native Pakistanis remains poorly investigated, as migrant studies have predominantly concentrated on East and Southeast Asian groups. This research aims to compare clinicopathological characteristics between the two groups. We hypothesize that significant differences will exist between these cohorts, mediated by gene-environment interactions. Methods: This was a retrospective cohort study utilizing two multi-institutional databases to identify adult patients with rectal cancer: the National Cancer Database in the U.S. (2018-2022) and the Rectal Cancer Surgery and Epidemiology Study in Pakistan (2020-2021). Non-Hispanic Whites (NHWs) were included as a reference population for comparative analysis. Clinicopathological characteristics were compared using Wilcoxon rank-sum and chi-square tests. Results: A total of 523 Pakistani Americans and 608 native Pakistanis were included in the study. The median age at diagnosis was 57 years in Pakistani Americans (IQR 48-68), 42 years (IQR 33-54) in native Pakistanis and 63 years in NHWs (IQR 54-73) (p < 0.001). Native Pakistanis presented with early-stage disease less often than Pakistani Americans and NHWs (5.3%, 25.1%, and 20.5%, respectively; p < 0.001) and had markedly higher rates of signet cell carcinoma (20.1%, 0.6%, and 0.4%, respectively; p < 0.001) and poorly differentiated tumors (29.0%, 10.4%, and 11.4%, respectively; p < 0.001). Conclusions: This study found that native Pakistanis with rectal cancer presented at a younger age and with more aggressive tumor characteristics compared to both Pakistani Americans and NHWs. Notably, Pakistani Americans displayed a distinct clinical profile, intermediate between both groups.

3
Imaging Mass Cytometry (IMC) as a Tool to Characterize Circulating Tumor Cells (CTCs) in Preclinical Mouse Models

Pore, M.; Balamurugan, K.; Atkinson, A.; Breen, D.; Mallory, P.; Cardamone, A.; McKennett, L.; Newkirk, C.; Sharan, S.; Bocik, W.; Sterneck, E.

2026-04-16 cancer biology 10.64898/2025.12.18.695262 medRxiv
Top 1%
1.5%

Circulating tumor cells (CTCs), and especially CTC-clusters, are linked to poor prognosis and may reveal mechanisms of metastasis and treatment resistance. Therefore, developing unbiased methods for the functional characterization of CTCs in liquid biopsies is an urgent need. Here, we present an evaluation of multiplex imaging mass cytometry (IMC) to analyze CTCs in mice with human xenograft tumors. In a single-step process, IMC uses metal-labeled antibodies to simultaneously detect a large number of proteins/modifications within minimally manipulated small volumes of blood from the tail vein or heart. We used breast cancer cell lines and a patient-derived xenograft (PDX) to assess antibodies for cross-species interpretation. Along with manual verification, HALO-AI-based cell segmentation was used to identify CTCs and quantify markers. Despite some limitations regarding human-specificity, this technology can be used to investigate the effect of genetic and pharmacological interventions on the properties of single and cluster CTCs in tumor-bearing mice.

4
A clinicoradiological model for preoperative prediction of lateral lymph node metastasis in rectal cancer

Shen, Q.; Wang, G.; Fu, M.; Yao, K.; Yang, Y.; Zeng, Q.; Guo, Y.

2026-04-15 gastroenterology 10.64898/2026.04.13.26350816 medRxiv
Top 2%
0.9%

Background: Lateral lymph node metastasis (LLNM) is associated with poor prognosis in patients with rectal cancer and may influence the indication for lateral lymph node dissection. Accurate preoperative identification of LLNM remains challenging. This study aimed to develop and internally validate a clinicoradiological model for preoperative prediction of LLNM in rectal cancer. Methods: A retrospective cohort of 64 patients undergoing lateral lymph node dissection (LLND) for rectal cancer was analysed; 21 (32.8%) had pathological LLNM. A prespecified preoperative clinicoradiological model was fitted using penalised logistic regression with L2 regularisation (ridge), incorporating MRI-measured lateral lymph node short-axis diameter (LLN-SAD), dichotomised clinical T stage (T3-4 vs T1-2), dichotomised clinical N stage (N+ vs N0), and log(CA19-9+1). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration analysis, and bootstrap internal validation. Results: The model showed good discrimination (AUC 0.914), with an optimism-corrected AUC of 0.887 on bootstrap validation. Calibration remained acceptable after optimism correction (calibration intercept -0.127; slope 1.045). Decision curve analysis suggested net benefit across clinically relevant threshold probabilities, particularly between 0.10 and 0.30. The model was implemented as a web-based calculator to facilitate clinical use. Conclusion: This clinicoradiological model showed good discrimination, acceptable calibration, and potential clinical utility for preoperative assessment of LLNM risk in rectal cancer. It may assist individualized risk stratification and treatment planning, although external validation is required before routine clinical implementation.

5
LRRK2 mutations block NCOA4 trafficking upon iron overload leading to ferroptotic death

Goldman, A.; Nguyen, M.; Lanoix, J.; Li, C.; Fahmy, A.; Zhong Xu, Y.; Schurr, E.; Thibault, P.; Desjardins, M.; McBride, H.

2026-04-17 cell biology 10.1101/2025.08.25.672135 medRxiv
Top 2%
0.5%

Altered iron homeostasis has long been implicated in Parkinson's Disease (PD), although the mechanisms have not been clear. Given the critical role of PD-related activating mutations in LRRK2 (leucine-rich repeat protein kinase 2) within membrane trafficking pathways, we examined the impact of a homozygous mutant LRRK2 G2019S on iron homeostasis within the RAW macrophage cell line with high iron capacity. Proteomics analysis revealed a dysregulation of iron-related proteins in steady state with highly elevated levels of ferritin light chain and a reduction of ferritin heavy chain. LRRK2 G2019S mutant cells showed efficient ferritinophagy upon iron chelation, but upon iron overload there was a near complete block in the degradation of the ferritinophagy adaptor NCOA4. These conditions lead to an accumulation of phosphorylated Rab8 at the plasma membrane, which is selectively inhibited by LRRK type II kinase inhibitors. Iron overload then leads to increased oxidative stress and ferroptotic cell death. These data implicate LRRK2 as a key regulator of iron homeostasis and point to the need for an increased focus on the mechanisms of iron dysregulation in PD.

6
A conserved grain-associated immunosuppressive niche in Sudanese patients with mycetoma.

Osman, M.; Ashwin, H.; Calder, G.; O'Toole, P.; Bakhiet, S. M.; Musa, A. M.; Kaye, P. M.; Fahal, A. H.

2026-04-13 infectious diseases 10.64898/2026.04.09.26350374 medRxiv
Top 2%
0.5%

Mycetoma is a neglected tropical disease caused by various bacterial and fungal pathogens that has a significant health impact across a broad geographically defined "mycetoma belt" spanning South America, Africa and Asia. Histologically, mycetoma is characterised by invasive and destructive granuloma development in the skin, deep tissues and bone, leading to tissue destruction, deformities and high morbidity. The presence of macroscopic, highly compacted pathogen microcolonies, or "grains," is a key diagnostic feature, and the formation of grains supports pathogen persistence and disease chronicity. However, there is a paucity of information on immune responses in mycetoma patients and on the relative importance of phylogeny and/or grains in establishing the local immune landscape. Here, we used spatial proteomics to examine the distribution of 43 immune-related proteins in surgical biopsies from 11 patients with mycetoma of bacterial (Actinomycetoma; Actinomadura pelletieri and Streptomyces somaliensis; n=6) and fungal (Eumycetoma; Madurella mycetomatis; n=5) origin. Using mixed-effects modelling, an exploratory analysis across species and pathogen classes revealed few significant differences in immune marker expression. In contrast, and independently of pathogen class, the cellular infiltrate closest to grain boundaries had higher per-cell expression of CD66b, ARG1, and VISTA. The preferential accumulation of CD66b+ARG1+VISTA+ cells at grain boundaries was confirmed by quantitative immunofluorescence analysis. Hence, the local tissue microenvironment surrounding the mycetoma grain represents a specialised immunosuppressive niche, with parallels to the tumour microenvironment.

7
Characterization of a pancreatic cancer GWAS signal suggests PDX1 buffers stress in the exocrine pancreas

Hoskins, J. W.; Christensen, T. A.; Eiser, D.; Char, E.; Mobaraki, M.; O'Brien, A.; Collins, I.; Zhong, J.; Patel, M. B.; Prasad, G.; Pancreatic Cancer Cohort Consortium and Pancreatic Cancer Case-Control Consortium (PanScan/PanC4); Arda, E.; Connelly, K. E.; Amundadottir, L. T.

2026-04-15 genetic and genomic medicine 10.64898/2026.04.13.26350790 medRxiv
Top 2%
0.5%

Pancreatic ductal adenocarcinoma (PDAC) remains one of the deadliest human cancers. The current largest published PDAC Genome-Wide Association Study (GWAS) identified 23 genetic risk signals, but most lack sufficient characterization. This study aimed to functionally characterize the chr13q12.2 (PLUT/PDX1) PDAC GWAS risk locus. Fine-mapping, luciferase reporter assays, and electrophoretic mobility shift assays implicated rs9581943, a PDX1 promoter SNP, as a functional variant underlying this GWAS signal. GTEx expression QTL analyses identified rs9581943 as a significant PDX1 eQTL in pancreas, and CRISPR/Cas9 editing in PDAC-derived cell lines confirmed a functional relationship. PDX1 is a transcription factor involved in early pancreas development and β-cell homeostasis, but its role in exocrine pancreatic cells is unclear. Single-nucleus RNA-seq analyses of pancreatic acinar and ductal cells from neonatal, adult, and chronic pancreatitis donors suggested PDX1 activity alleviates high secretory load and ER stress in acinar cells and biases ductal cells toward homeostatic phenotypes. Similarly, scRNA-seq analyses of pancreatic tumors suggested PDX1 activity reduces biosynthetic and inflammatory stress and promotes epithelial differentiation. Our study therefore implicates rs9581943 as a causal variant for the chr13q12.2 PDAC GWAS signal wherein the risk allele reduces PDX1 expression, eroding PDX1's capacity to buffer stress and stabilize epithelial cell fate in the exocrine compartment.

8
A Multi-Cohort Study of Immunoglobulin G Glycans in Newly Diagnosed Inflammatory Bowel Disease Patients Reveals Accelerated Biological Aging

Flevaris, K.; Trbojevic-Akmacic, I.; Goh, D.; Lalli, J. S.; Vuckovic, F.; Capin Vilaj, M.; Stambuk, J.; Kristic, J.; Mijakovac, A.; Ventham, N.; Kalla, R.; Latiano, A.; Manetti, N.; Li, D.; McGovern, D. P. B.; Kennedy, N. A.; Annese, V.; Lauc, G.; Satsangi, J.; Kontoravdi, C.

2026-04-11 gastroenterology 10.64898/2026.04.10.26349930 medRxiv
Top 3%
0.4%

Background and Aims: Alterations in immunoglobulin G (IgG) N-glycosylation are implicated in inflammatory bowel disease (IBD); however, the robustness of IgG glycan signatures across IBD cohorts with diverse demographics and geographic origins remains underexplored. We aimed to determine whether compositional data analysis (CoDA) and machine learning (ML) can identify IBD-related IgG N-glycan signatures and whether these signatures capture disease-associated acceleration of biological aging. Methods: We analyzed the IgG glycome profiles of 1,367 plasma samples collected from healthy controls (HC), symptomatic controls (SC), and people with newly diagnosed Crohn's disease (CD) and ulcerative colitis (UC) across four cohorts (UK, Italy, United States, and Netherlands). IgG glycosylation was analyzed by ultra-high-performance liquid chromatography, yielding 24 total-area-normalized glycan peaks (GPs). Analyses were performed using cross-sectional data obtained at baseline. CoDA-powered association analyses were used to identify disease-related effects on GPs while controlling for demographic covariates. ML models were trained and evaluated to assess generalizability to unseen cohorts and demographic subgroups, with a focus on discrimination and reliability. Results: Across all cohorts, people with IBD demonstrated accelerated biological aging as quantified by the GlycanAge index. This was accompanied by consistent reductions in IgG galactosylation, with effects partially modulated by age. Classification models trained on glycomics and demographics achieved robust discrimination (AUROC ~0.80) between non-IBD (HC+SC) and IBD across cohorts. Conclusion: These findings reveal accelerated biological aging in people with IBD and support the translational potential of IgG glycans as biomarkers and a novel route toward clinically interpretable personalized risk estimates.

9
VAE (Variational Autoencoder) Based Gastrotype Identification and Predictive Diagnosis of Helicobacter pylori Infection

Ma, Z.; Qiao, Y.

2026-04-13 gastroenterology 10.64898/2026.04.11.26350690 medRxiv
Top 11%
0.1%

Background: The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification exists for gastric microbial community types. Methods: We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, with both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than arbitrarily selecting a cluster number, and performed transparent evaluation of multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis. Results: Four gastrotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastrotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while two gastrotypes (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993). Conclusions: The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastrotypes without forcing discrete structure or arbitrarily fixing cluster numbers. Using this framework, we identified four gastrotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
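
The abstract above selects the number of clusters by comparing mean silhouette scores across K=2 to K=10. The following is a minimal sketch of that selection step only, not the authors' pipeline: the toy points, the candidate labelings, and the function name `mean_silhouette` are illustrative assumptions, and in practice the labels would come from a clustering algorithm run on the VAE latent space.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        # Group distances from point i to every other point by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            # Singleton cluster or single-cluster solution: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Hypothetical candidate clusterings keyed by K (assumed to be produced elsewhere).
candidates = {
    2: [0, 0, 1, 1],
    4: [0, 1, 2, 3],
}

scores = {k: mean_silhouette(points, lab) for k, lab in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```

Choosing the K with the highest mean score mirrors the criterion the abstract describes; it does not by itself show that the data contain discrete clusters.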

10
Sex-Stratified Multi-Omics Identifies Sexually Dimorphic Molecular Targets in Parkinson's Disease

Lee, J.-Y.; Lee, J.; Lee, S.; Yoon, J. H.; Park, D. G.; Sung, J.

2026-04-13 genetic and genomic medicine 10.64898/2026.04.10.26350571 medRxiv
Top 12%
0.0%

Parkinson's disease (PD) exhibits well-established sex differences in prevalence and clinical phenotypes, yet the underlying molecular mechanisms remain largely elusive. Here, we conducted a comprehensive sex-stratified multi-omic integration to identify sex-specific causal proteins and biological pathways in PD. We performed gene-based association analysis, transcriptome-wide association studies (TWAS), and proteome-wide Mendelian randomization (PWMR) with colocalization analysis using GWAS summary statistics from the International PD Genetics Consortium (IPDGC; 12,054 male cases/11,999 controls; 7,384 female cases/12,389 controls) for sex-stratified analyses and Global Parkinson's Genetics Program (GP2; 34,933 cases/31,009 controls) for sex-combined analyses. Prioritized candidates were further evaluated through MR with brain expression quantitative trait loci (eQTLs) from MetaBrain and differential protein abundance analysis using the Global Neurodegeneration Proteomics Consortium (GNPC; 704 PD cases/5,629 controls in plasma; 78 cases/1,411 controls in cerebrospinal fluid). Additionally, pathway enrichment analysis was performed for prioritized molecules. Integration across three analytical layers prioritized 102 molecular candidates across 31 unique loci, significant from multiple analyses. Of these, eleven genes reached significance across all three layers, including SNCA, MAPT, and CTSB significant in both sexes; CD160, GPNMB, and LRRC37A2 as male-predominant; STX4 and PRSS53 as female-predominant; and BST1, SCARB2, and LGALS3 significant only in sex-combined analysis. In males, CD160 emerged as a novel candidate with convergent evidence across all three analyses and colocalization, while L3MBTL2 was identified as a novel risk gene from gene-based association and TWAS analyses. In females, STX4 and PRSS53 at the 16p11.2 locus showed female-predominant associations.
Pathway enrichment analysis revealed innate immune and SUMOylation pathways in males, with CD160 and L3MBTL2 as key contributors respectively, contrasting with WDR5-mediated chromatin remodeling in females. Brain eQTL-based MR confirmed significant associations for 69 of 86 testable candidates (80.2%) in at least one tissue. Protein abundance analysis confirmed sex-specific patterns, and several candidates showed discordant directions between genetically predicted causal effects and observed protein abundance -- including male-specific plasma elevation of CD160 and female-specific patterns for STX4 -- underscoring the distinction between causal risk mechanisms and disease-state molecular changes. These findings demonstrate that PD is a molecularly heterogeneous disorder with sexually dimorphic pathogenic drivers. While shared axes such as lysosomal dysfunction and vesicle trafficking disruption exist, the divergence into male-specific immune dysregulation and female-specific chromatin remodeling suggests that the primary triggers of neurodegeneration differ by sex. Our results underscore the necessity of sex-stratified approaches in biomarker discovery and the development of precision therapeutic strategies for PD.

11
GPR143, a novel immunohistochemical marker for renal tumors with FLCN/TSC/MTOR-TFE alterations

Li, Q.; Singh, A.; Hu, R.; Huang, W.; Shapiro, D. D.; Abel, E. J.; Zong, Y.

2026-04-13 pathology 10.64898/2026.04.06.26350070 medRxiv
Top 12%
0.0%

Although several ancillary tests are available in limited laboratories, diagnosis of microphthalmia (MiT)/TFE family translocation renal cell carcinoma (tRCC) can be challenging due to diverse and overlapping tumor morphology and the lack of reliable biomarkers. GPNMB has been recently identified as a diagnostic marker for various renal neoplasms with FLCN/TSC/mTOR-TFE alterations. However, the sensitivity and specificity of GPNMB immunostain are suboptimal, and interpreting results in ambiguous cases can be difficult. To search for additional biomarkers that could improve the screening sensitivity and predict genetic aberrations in the FLCN/TSC/mTOR-TFE pathway in renal tumors, we performed bioinformatic analysis of publicly available cancer databases and found that GPR143, a transmembrane protein regulated by MiT transcription factors, was highly expressed in a subset of renal cell carcinomas (RCCs). In two The Cancer Genome Atlas (TCGA) kidney cancer cohorts, RCCs with high levels of GPR143 expression were enriched for renal neoplasms with FLCN/TSC/mTOR-TFE alterations. Similar to GPNMB labeling, GPR143 immunostain was positive in the majority of tRCC cases and renal tumors with FLCN/TSC/mTOR alterations, suggesting that GPR143 could function as another surrogate marker for FLCN/TSC/mTOR-TFE alterations in certain renal tumors. Interestingly, despite the concordant GPR143 and GPNMB immunoreactivity in most renal neoplasms with FLCN/TSC/mTOR-TFE alterations, diffuse GPR143 immunostain was observed in some cases with negative or focal GPNMB labeling. Taken together, our results indicate GPR143 could serve as a useful adjunct marker to improve the sensitivity for screening renal tumors with FLCN/TSC/mTOR-TFE alterations.

12
A safer fluorescent in situ hybridization protocol for cryosections

Chihara, A.; Mizuno, R.; Kagawa, N.; Takayama, A.; Okumura, A.; Suzuki, M.; Shibata, Y.; Mochii, M.; Ohuchi, H.; Sato, K.; Suzuki, K.-i. T.

2026-04-16 molecular biology 10.1101/2025.05.25.655994 medRxiv
Top 13%
0.0%

Fluorescent in situ hybridization (FISH) enables highly sensitive, high-resolution detection of gene transcripts. Moreover, by employing multiple probes, this technique allows for multiplexed, simultaneous detection of distinct gene expression patterns spatiotemporally, making it a valuable spatial transcriptomics approach. Owing to these advantages, FISH techniques are rapidly being adopted across diverse areas of basic biology. However, conventional protocols often rely on volatile, toxic reagents such as formalin or methanol, posing potential health risks to researchers. Here, we present a safer protocol that replaces these chemicals with low-toxicity alternatives, without compromising the high detection sensitivity of FISH. We validated this protocol using both in situ hybridization chain reaction (HCR) and signal amplification by exchange reaction (SABER)-FISH in frozen sections of various model organisms, including mouse (Mus musculus), amphibians (Xenopus laevis and Pleurodeles waltl), and medaka (Oryzias latipes). Our results demonstrate successful multiplexed detection of morphogenetic and cell-type marker genes in these model animals using this safer protocol. The protocol has the additional advantage of requiring no proteolytic enzyme treatment, thus preserving tissue integrity. Furthermore, we show that this protocol is fully compatible with EGFP immunostaining, allowing for the simultaneous detection of mRNAs and reporter proteins in transgenic animals. This protocol retains the benefits of highly sensitive, multiplexed, and multimodal detection afforded by integrating in situ HCR and SABER-FISH with immunohistochemistry, while providing a safer option for researchers, thereby offering a valuable tool for basic biology.

13
Virtual Spectral Decomposition with Dendritic Tile Selection: An Explainable AI Framework for Multimodal Tissue Composition Analysis and Immune Phenotyping Across Pancreatic, Lung, and Breast Cancer

Chandra, S.

2026-04-13 oncology 10.64898/2026.04.11.26350689 medRxiv
Top 15%
0.0%

Background: Current deep learning models in computational pathology, radiology, and digital pathology produce opaque predictions that lack the explainable artificial intelligence (xAI) capabilities required for clinical adoption. Despite achieving radiologist-level performance in tasks from whole-slide image (WSI) classification to mammographic screening, these models function as black boxes: clinicians cannot trace predictions to specific biological features, verify outputs against established morphological criteria, or integrate AI reasoning into precision oncology workflows and tumor board decision-making. Methods: We present Virtual Spectral Decomposition (VSD), a modality-agnostic, interpretable-by-design framework that decomposes medical images into six biologically interpretable tissue composition channels using sigmoid threshold functions - the same mathematical structure as CT windowing. Unlike post-hoc xAI methods (Grad-CAM, SHAP, LIME) applied to black-box deep learning models, VSD channels have pre-defined biological meanings derived from tissue physics, providing inherent explainability without sacrificing quantitative rigor. For whole-slide image (WSI) analysis in digital pathology, we introduce the dendritic tile selection algorithm, a biologically-inspired hierarchical architecture achieving 70-80% computational reduction while preferentially sampling the tumor immune microenvironment. VSD is validated across three cancer types and imaging modalities: pancreatic ductal adenocarcinoma (PDAC) on CT imaging, lung adenocarcinoma (LUAD) on H&E-stained pathology slides using TCGA data, and breast cancer on screening mammography. Composition entropy of the six-channel vector is computed as a visual Biological Entropy Index (vBEI) - an imaging biomarker quantifying the diversity of active biological defense systems. 
Results: In pancreatic cancer, the fat-to-stroma ratio (a novel CT-derived radiomics biomarker) declines from >5.0 (normal) to <0.5 (advanced PDAC), enabling early detection of desmoplastic invasion before mass formation on standard imaging. In lung cancer, composition entropy from H&E whole-slide images correlates with tumor immune microenvironment markers from RNA-seq (CD3: rho=+0.57, p=0.009; CD8: rho=+0.54, p=0.015; PD-1: rho=+0.54, p=0.013) and predicts overall survival (low entropy immune-desert phenotype: 71% mortality vs 29%, p=0.032; n=20 TCGA-LUAD), providing immune phenotyping for checkpoint immunotherapy patient selection from a $5 H&E slide without molecular assays. In breast cancer, each lesion type produces a characteristic six-channel fingerprint functioning as an interpretable computer-aided diagnosis (CAD) system for quantitative BI-RADS assessment and subtype classification (IDC vs ILC vs DCIS vs IBC). A five-level xAI audit trail provides complete traceability from clinical decision support output to specific biological structures visible on the original images. Conclusion: VSD establishes a unified, interpretable-by-design mathematical framework for explainable tissue composition analysis across imaging modalities and cancer types. Unlike black-box deep learning and post-hoc xAI approaches, VSD provides inherently interpretable, clinically verifiable cancer detection and immune phenotyping from standard clinical imaging at existing costs - without requiring foundation model infrastructure, specialized hardware, or molecular assays. The open-source pipeline (Google Colab, Supplementary Material) enables immediate reproducibility and extension to additional cancer types across the pan-cancer TCGA atlas.
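
The abstract above describes two ingredients: sigmoid threshold functions that map image intensities to channel activations, and a composition entropy computed over a six-channel vector. The following is a minimal sketch of both ideas under stated assumptions. The abstract does not give the formulas, so the Shannon entropy definition, the function names `sigmoid_window` and `composition_entropy`, and every numeric value below are hypothetical illustrations rather than the author's implementation.

```python
import math

def sigmoid_window(value, center, width):
    """Soft threshold: near 0 well below `center`, near 1 well above it.

    `center` and `width` are hypothetical parameters, loosely analogous to
    the level and window of CT windowing.
    """
    return 1.0 / (1.0 + math.exp(-(value - center) / width))

def composition_entropy(channels):
    """Shannon entropy (natural log) of a non-negative channel vector.

    Assumption: the vector is normalized to sum to 1 before the entropy is
    taken; zero-valued channels contribute nothing.
    """
    total = sum(channels)
    if total <= 0:
        return 0.0
    probs = [c / total for c in channels if c > 0]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical six-channel compositions.
uniform = [1.0] * 6                              # all channels equally active
single = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]          # one dominant channel

print(composition_entropy(uniform))              # maximal for six channels: log(6)
print(composition_entropy(single))               # minimal: zero
print(sigmoid_window(40.0, center=40.0, width=10.0))
```

Under this definition, entropy ranges from 0 (one channel only) to log(6) (all six channels equal), which matches the abstract's reading of low entropy as a less diverse composition.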

14
Identification, evolutionary history and characteristics of orphan genes in root-knot nematodes

Seckin, E.; Colinet, D.; Bailly-Bechet, M.; Seassau, A.; Bottini, S.; Sarti, E.; Danchin, E. G.

2026-04-11 bioinformatics 10.64898/2025.12.19.695360 medRxiv
Top 15%
0.0%

Orphan genes, lacking homologs in other species, are systematically found across genomes. Their presence may result from extensive divergence from pre-existing genes or from de novo gene birth, which occurs when a gene emerges from a previously non-genic region. In this study, we identified orphan genes in the genomes of globally distributed plant-parasitic nematodes of the genus Meloidogyne and investigated their origins, evolution, and characteristics. Using a comparative genomics framework across 85 nematode species, we found that 18% of Meloidogyne genes are genus-specific, transcriptionally supported orphans. By combining ancestral sequence reconstruction and synteny-based approaches, we inferred that 20% of these orphan genes originated through high divergence, while 18% likely emerged de novo. Proteomic and translatomic evidence confirmed the translation of a subset of these genes, and feature analyses revealed distinctive molecular signatures, including shorter length, signal peptide enrichment, and a tendency for extracellular localization. These findings highlight orphan genes as a substantial and previously underexplored component of the Meloidogyne genome, with potential roles in their worldwide parasitism.

15
Single-molecule cfDNA sequencing establishes clinical utility for ecDNA monitoring and multimodal liquid biopsy analysis

Sauer, C. M.; Tovey, N.; Ptasinska, A.; Hughes, D.; Stockton, J.; Zumalave, S.; Rust, A. G.; Lynn, C.; Livellara, V.; Sevrin, F.; Himsworth, C.; Muyas, F.; Nicolaidou, M.; Parry, G.; Paisana, E.; Cascao, R.; Ahmed, S. W.; Yasin, S. A.; Portela, L. R.; Balasubramanian, P.; Burke, G. A. A.; Vedi, A.; Faria, C. C.; Marshall, L. V.; Jacques, T. S.; Hubank, M.; Hargrave, D.; George, S.; Angelini, P.; Anderson, J.; Chesler, L.; Beggs, A. D.; Cortes-Ciriano, I.

2026-04-12 oncology 10.64898/2026.04.08.26350410 medRxiv
Top 16%
0.0%

Cell-free DNA (cfDNA) profiling enables minimally invasive cancer detection and monitoring. We present SIMMA, a low-input single-molecule sequencing approach that enables multimodal whole-genome and high-depth targeted sequencing of the same cfDNA sample for both tumour-agnostic and tumour-informed liquid biopsy analysis. Across 792 plasma and cerebrospinal fluid cfDNA samples from 277 paediatric patients with diverse brain and extracranial tumours, SIMMA enabled tumour diagnosis, detection of driver mutations, and reconstruction of extrachromosomal DNA (ecDNA) months before clinical relapse. Using conformal prediction trained on genome-wide fragmentomics, genomic and epigenomic data, SIMMA predicts disease burden as a continuous variable and provides well-calibrated uncertainty estimates for each sample, achieving a limit of detection of ~100 ppm from low-pass whole-genome sequencing data. In summary, SIMMA establishes the clinical utility of multimodal cfDNA profiling with uncertainty quantification for individual patients and unlocks the potential of ecDNA as a liquid biopsy biomarker for disease detection and monitoring across diverse aggressive malignancies.

16
Wearable-derived physiological features for trans-diagnostic disease comparison and classification in the All of Us longitudinal real-world dataset

Huang, X.; Hsieh, C.; Nguyen, Q.; Renteria, M. E.; Gharahkhani, P.

2026-04-13 epidemiology 10.64898/2026.04.07.26350352 medRxiv
Top 17%
0.0%

Wearable-derived physiological features have been associated with disease risk, but most current studies focus on single conditions, limiting understanding of cross-disease patterns. This study adopts a trans-diagnostic approach to examine whether wearable data capture shared and condition-specific physiological signatures across multiple chronic conditions spanning physical and mental health, and then evaluates the utility of these features for disease classification. A total of 9,301 patients with at least 21 days of consecutive Fitbit data from the All of Us Controlled Tier Dataset version 8 were analyzed. Disease subcohorts included cardiovascular disease (CVD), diabetes, obstructive sleep apnea (OSA), major depressive disorder (MDD), anxiety, bipolar disorder, and attention-deficit/hyperactivity disorder (ADHD), chosen based on prevalence and relevance. Logistic regression and XGBoost models were fitted for each disease subcohort versus the control cohort. Compared with using only baseline demographic and lifestyle features, incorporating wearable-derived features improved classification performance in all subcohorts for both models, except for ADHD, where improvement was mainly observed in ROC-AUC for the logistic regression model, likely because of the smaller sample size of the ADHD subcohort. The largest performance gains were observed in MDD (increase in ROC-AUC of 0.077 for logistic regression, 0.071 for XGBoost; p < 0.001) and anxiety (increase in ROC-AUC of 0.077 for logistic regression, 0.108 for XGBoost; p < 0.001). This study provides one of the first comprehensive trans-diagnostic evaluations of wearable-derived features for disease classification, highlighting their potential to enhance risk stratification in the real-world setting as a practical complement to clinical assessments and providing a foundation to explore more fine-grained wearable data.
Author summary: Wearable devices such as fitness trackers and smartwatches are becoming increasingly popular and affordable, providing continuous measurements of heart rate, physical activity, and sleep. Alongside the growing digitization of health records, this creates new opportunities for large-scale, real-world health studies. In this study, we analyzed wearable-derived physiological patterns across a range of chronic conditions spanning both physical and mental health to better understand how these signals relate to disease risk. We found that incorporating wearable-derived heart rate, activity and sleep features improved disease risk classification across several conditions, with particularly strong gains for major depressive disorder and anxiety. By examining how individual features contributed to model predictions, we also identified meaningful associations between physiological signals and disease risk. For example, both duration and day-to-day variation of deep and rapid eye movement (REM) sleep were associated with increased risk in certain conditions. Our study supports the development of real-time, automated tools to assess disease risk alongside clinical care.

17
Frequency of bacterial STI testing amongst people accessing sexual health services in England, 2024: a cross-sectional analysis of national surveillance data

Baldry, G.; Harb, A.-K.; Findlater, L.; Ogaz, D.; Migchelsen, S. J.; Fifer, H.; Saunders, J.; Mohammed, H.; Sinka, K.

2026-04-13 epidemiology 10.64898/2026.04.08.26349546 medRxiv
Top 17%
0.0%

Objectives: We determined the frequency of sexually transmitted infection (STI) testing among people accessing sexual health services (SHS) in England. Methods: We assessed STI testing frequency in face-to-face and online SHSs in England using data from the GUMCAD STI surveillance system. We quantified different combinations of tests (e.g. single chlamydia test or full STI screen), number of tests completed in 2024 and test positivity by sociodemographic and behavioural characteristics, as well as clinical setting and outcomes. Results: Overall, there were 2,222,028 attendances at SHS in England in 2024 that involved tests for chlamydia, gonorrhoea, syphilis and/or HIV. Most of these attendances involved tests for all four of these STIs. Most people accessing SHS in England tested once (80.1%), and a small minority (1.9%) tested at least quarterly (4+ times). Some groups had a comparatively larger proportion of quarterly testers; these included gay, bisexual, and other men who have sex with men (GBMSM) (6.7%), London residents (3.6%), online testers (2.5%), people using HIV-PrEP (13%), and people with 5+ partners in the previous 3 months (10.6%). Only 10.5% of GBMSM reporting higher-risk sexual behaviours tested quarterly despite recommendations for quarterly testing in this group. Conclusions: The majority of those who tested for STIs in England in 2024 only tested once. The minority who tested at least quarterly had a higher proportion of GBMSM, people using HIV-PrEP, London residents and people reporting higher-risk behaviours. Quarterly testing often appears to be aligned with current testing recommendations in England; however, we also observed that only a low proportion of behaviourally high-risk GBMSM and HIV-PrEP users are meeting these recommendations. It is important to acknowledge groups with lower or higher testing frequency when developing interventions and updating guidelines related to STI testing.
WHAT IS ALREADY KNOWN ON THIS TOPIC: The effectiveness of asymptomatic testing for chlamydia and gonorrhoea in gay, bisexual and other men who have sex with men (GBMSM), and the potential impact of the consequent increased antibiotic use on rising antimicrobial resistance and individual harm, have recently been questioned. Testing and treatment remain a key pillar of STI prevention and management; despite this, there is limited evidence of STI testing frequency within sexual health services (SHS) on a national level. WHAT THIS STUDY ADDS: This analysis shows that the majority of people attending SHSs in England in 2024 tested once, and only a small proportion of behaviourally high-risk people tested frequently. HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY: Awareness of groups that are behaviourally high risk but testing infrequently is important to guide interventions and messaging regarding STI testing. The low levels of frequent testing, even among those who would be recommended quarterly testing under UK guidelines, provide important context for wider discussion around asymptomatic STI screening.

18
Wastewater detections of Bordetella pertussis and Mycobacterium tuberculosis nucleic acids in active disease outbreak sites in the USA

Paulos, A. P.; Zulli, A.; Duong, D.; Shelden, B.; White, B. J.; North, D.; Boehm, A. B.; Wolfe, M. K.

2026-04-11 public and global health 10.64898/2026.04.09.26350536 medRxiv
Top 17%
0.0%

Respiratory infections caused by bacterial pathogens like Mycobacterium tuberculosis and Bordetella pertussis have increased since the COVID-19 pandemic, yet clinical surveillance of both suffers from underreporting and delayed diagnoses. Wastewater monitoring is a valuable public health surveillance tool that can help fill gaps in clinical data, yet it has rarely been applied to respiratory bacterial pathogens despite evidence of bacterial shedding via excretion types that enter wastewater. In this study, we investigated the feasibility of wastewater monitoring for two bacterial respiratory diseases, tuberculosis and pertussis, using one case study each for M. tuberculosis and B. pertussis. We retrospectively measured concentrations of these pathogens in wastewater samples collected longitudinally from communities with and without known outbreaks of these diseases. We designed and validated a novel B. pertussis-specific assay for the NAD(P) gene; B. pertussis nucleic acids were detected sporadically in wastewater during an identified outbreak. We used a highly specific, established assay for M. tuberculosis nucleic acids, and found low concentrations of the marker in wastewater that were lag-correlated with clinical incidence rates 5 weeks later. Findings support the potential of wastewater monitoring for M. tuberculosis and B. pertussis to enable identification of communities with outbreaks of tuberculosis and pertussis and provide early warning for tuberculosis.
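
As a rough illustration of what "lag-correlated with clinical incidence rates 5 weeks later" can mean, the sketch below pairs each weekly wastewater value with the incidence value a fixed number of weeks later and computes a correlation on those pairs. The abstract does not state which correlation statistic was used; Pearson correlation is assumed here, and the function names and the toy series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(wastewater, incidence, lag_weeks):
    """Correlate wastewater[t] with incidence[t + lag_weeks].

    Both series are assumed (for this sketch) to be weekly values
    covering the same weeks, in chronological order.
    """
    if len(wastewater) != len(incidence):
        raise ValueError("series must cover the same weeks")
    if lag_weeks < 0 or lag_weeks > len(wastewater) - 2:
        raise ValueError("lag leaves fewer than two paired weeks")
    earlier = wastewater[: len(wastewater) - lag_weeks]
    later = incidence[lag_weeks:]
    return pearson(earlier, later)


# Toy data (made up): incidence mirrors wastewater five weeks earlier.
toy_wastewater = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11]
toy_incidence = [0, 0, 0, 0, 0] + [2 * w for w in toy_wastewater[:7]]
toy_r = lagged_correlation(toy_wastewater, toy_incidence, lag_weeks=5)
```

A positive lag in this sketch means wastewater leads the clinical series, which is the direction relevant to an early-warning reading.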

19
Non-genetic component of height as a surrogate marker for childhood socioeconomic position and its association with cardiovascular and brain health: results from HCHS/SOL

Moon, J.-Y.; Filigrana, P.; Gallo, L. C.; Perreira, K. M.; Cai, J.; Daviglus, M.; Fernandez-Rhodes, L. E.; Garcia-Bedoya, O.; Qi, Q.; Thyagarajan, B.; Tarraf, W.; Wang, T.; Kaplan, R.; Isasi, C. R.

2026-04-13 epidemiology 10.64898/2026.04.08.26350438 medRxiv
Top 17%
0.0%

Childhood socioeconomic position (SEP) can have lifelong effects on health. Many studies have used adult height as a surrogate marker for early-life conditions. In this study, we derived the non-genetic component of height, calculated as the residual from sex-specific standardized height regressed on genetically predicted height, as a surrogate for childhood SEP, using data from the Hispanic Community Health Study/Study of Latinos (2008-2011). A positive residual indicates favorable early-life conditions that promote growth, while a negative residual indicates early-life adversity that may stunt development. The height residual was associated with early-life variables such as parental education, year of birth, US nativity and age at first migration to the US (50 states/DC), supporting the validity of the height residual as a surrogate for early-life conditions. Furthermore, the height residual was positively associated with cardiovascular health (CVH) and cognitive function among middle-aged and older adults. Interestingly, among participants younger than 35 years, the height residual was negatively associated with the "Life's Essential 8" clinical CVH scores. These results support the non-genetic component of height as a surrogate for childhood environment, with predictive value for CVH and cognitive function.
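
The residual described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: height is z-scored within each sex, a single ordinary least-squares line is then fitted across all participants, and no further covariates are included. The abstract does not specify these details, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of numbers (population standard deviation)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def ols_residuals(y, x):
    """Residuals of a simple least-squares fit of y on x (with intercept)."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def height_residuals(heights_cm, predicted_height, sexes):
    """Non-genetic height component: standardized height minus its fitted value.

    Assumption for this sketch: standardization is done within sex,
    then one regression is fitted over everyone.
    """
    z_height = [0.0] * len(heights_cm)
    for sex in set(sexes):
        idx = [i for i, s in enumerate(sexes) if s == sex]
        z_values = standardize([heights_cm[i] for i in idx])
        for i, z in zip(idx, z_values):
            z_height[i] = z
    return ols_residuals(z_height, predicted_height)


# Toy values (made up), not study data.
toy_heights = [160, 165, 170, 172, 178, 184]
toy_sexes = ["F", "F", "F", "M", "M", "M"]
toy_predicted = [-0.5, 0.1, 0.4, -0.3, 0.0, 0.6]
toy_residuals = height_residuals(toy_heights, toy_predicted, toy_sexes)
```

By construction the residuals average to zero and are uncorrelated with genetically predicted height, so a positive value marks someone taller than their genetic prediction would suggest.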

20
Time to diagnosis among children and adolescents with cancer in Quebec, Canada: a population-based study

Mullen, C.; Barr, R. D.; Strumpf, E.; El-Zein, M.; Franco, E. L.; Malagon, T.

2026-04-13 epidemiology 10.64898/2026.04.09.26350491 medRxiv
Top 17%
0.0%

Background: Timely cancer diagnosis in children and adolescents is critical to improving outcomes, yet substantial variation in diagnostic intervals persists across cancer types and care settings. We aimed to quantify time to diagnosis and assess variations by patient, demographic, and system-level factors. Methods: We conducted a retrospective population-based study of children and adolescents aged 0-19 years diagnosed with one of 12 common cancers between 2010 and 2022 in Quebec, Canada. The diagnostic interval was defined as the time from first cancer-related healthcare encounter to diagnosis. We calculated medians and interquartile ranges (IQR) overall and by cancer type and used multivariable quantile regression to identify factors associated with time to diagnosis at the 25th, 50th, and 75th percentiles. Results: Among 2,927 individuals with cancer, diagnostic intervals varied by cancer type and age. Median intervals were longest for carcinomas (100 days; IQR 33-192) and shortest for leukemias (8 days; IQR 3-44). Compared with children living in Montreal, living in regional areas and other large urban centres was associated with longer 50th and 75th percentiles of time to diagnosis for hepatic and central nervous system (CNS) tumours. Diagnostic intervals were shorter in the post-pandemic period (2020-2022) across several cancer sites, with CNS tumours showing reductions across all quantiles. Interpretation: Diagnostic timeliness differed by cancer type, age, and rurality, but not by sex or by material or social deprivation. The shorter diagnostic intervals observed in the post-pandemic period suggest that pandemic-related changes in care pathways may have expedited diagnosis for some cancers.